Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

Authors

  • Dong Huk Park
  • Lisa Anne Hendricks
  • Zeynep Akata
  • Anna Rohrbach
  • Bernt Schiele
  • Trevor Darrell
  • Marcus Rohrbach
Abstract

Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths. We collect two new datasets to define and evaluate this task, and propose a novel model which can provide joint textual rationale generation and attention visualization. Our datasets define visual and textual justifications of a classification decision for activity recognition tasks (ACT-X) and for visual question answering tasks (VQA-X). We quantitatively show that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision. We also qualitatively show cases where visual explanation is more insightful than textual explanation, and vice versa, supporting our thesis that multimodal explanation models offer significant benefits over unimodal approaches.

Similar resources

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

Deep models are the de facto standard in visual decision problems due to their impressive performance on a wide array of visual tasks. On the other hand, their opaqueness has led to a surge of interest in explainable systems. In this work, we emphasize the importance of model explanation in various forms such as visual pointing and textual justification. The lack of data with justification annot...

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

Deep models are the de facto standard in visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and are unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions. We postulate that deep models ca...

The impact of proactive and reactive focus on form in multimodal settings on EFL learners' comprehension and production of modal auxiliaries

The major objective of this mixed methods research, which considered elements of both quantitative and qualitative research approaches, was to examine the effect of two different types of focus on form instruction, namely proactive and reactive across multimodal vs. traditional input settings on Iranian EFL learners' comprehension and production of modal auxiliaries. The participants of the stu...

Decisions of Value: Going Backstage; Comment on “Contextual Factors Influencing Cost and Quality Decisions in Health and Care: A Structured Evidence Review and Narrative Synthesis”

This commentary expands on two of the key themes briefly raised in the paper involving analysis of the evidence about key contextual influences on decisions of value. The first theme focuses on the need to explore in more detail what is called backstage decision-making looking at how actual decisions are made drawing on evidence from ethnographies about decision-making. These studies point to l...

Towards generation of fluent referring action in a multimodal situation

We have been developing a system that uses natural language in combination with visual information such as pictures and gestures, to generate effective explanations. The experimental system we implemented is for explaining the installation and operation of a telephone with answering machine feature, and simulates instruction dialogues performed by an expert in a face-to-face situation with a te...

Journal:
  • CoRR

Volume: abs/1802.08129

Publication date: 2018